Gene networks governing the response of a calcareous sponge to future ocean conditions reveal lineage‐specific XBP1 regulation of the unfolded protein response

Abstract Marine sponges are predicted to be winners in the future ocean due to their exemplary adaptive capacity. However, while many sponge groups exhibit tolerance to a wide range of environmental insults, calcifying sponges may be more susceptible to thermo‐acidic stress. To describe the gene regulatory networks that govern the stress response of the calcareous sponge, Leucetta chagosensis (class Calcarea, order Clathrinida), individuals were subjected to warming and acidification conditions based on the climate models for 2100. Transcriptome analysis and gene co‐expression network reconstruction revealed that the unfolded protein response (UPR) was activated under thermo‐acidic stress. Among the upregulated genes were two lineage‐specific homologs of X‐box binding protein 1 (XBP1), a transcription factor that activates the UPR. Alternative dimerization between these XBP1 gene products suggests a clathrinid‐specific mechanism to reversibly sequester the transcription factor into an inactive form, enabling the rapid regulation of pathways linked to the UPR in clathrinid calcareous sponges. Our findings support the idea that transcription factor duplication events may refine evolutionarily conserved molecular pathways and contribute to ecological success.


Generating a non-redundant transcriptome for Leucetta chagosensis
Raw sequence reads were trimmed using Trimmomatic v0.32 (1).Poor-quality bases (quality score < 3) at the start and end of the reads, as well as the first 15 bases from the start of the reads, were removed.Reads were also trimmed if the average per-base quality within a 4-base sliding window fell below 20 and if the length is < 36 bases.De novo transcriptome assembly was carried out using Trinity (2).Transcripts with 90% sequence similarity were clustered and the longest representative contigs (> 300bp) were retained.Isoforms with the highest combined IsoPct or longest length were retained for each transcript to generate a non-redundant reference transcriptome.Isoforms with zero isoform percentage (IsoPct) were removed.Quality and completeness of the assemblies were assessed using Bowtie 2 v2.3.5.1 (3), Detonate (4), TransRate v1.0.3 (5), and BUSCO v3.1.0(6).The reference transcriptome was then annotated by aligning against the UniProtKB/Swiss-Prot database (April 2020) with an e-value cut-off of 1x10 -5 .
Peptides were predicted using the Transdecoder package in Trinity and annotated by alignment against the GenBank non-redundant (nr) sequence and UniProtKB/Swiss-Prot databases with an e-value cut-off of 1x10 -5 .The top Blastp hit for each peptide was used as input into OmicsBox (BioBam, Valencia, Spain) (7) to predict gene ontology (GO) annotations.Protein domains were identified by mapping the peptide sequences against Pfam 32.0 database (8) using HMMER v3.3 (9).The top Blastx and Blastp hits in UniProtKB/Swiss-Prot database and Pfam annotations for each transcript were then used as input into Trinotate v3.2.2 (10) to generate a comprehensive annotation report.GO annotations generated in OmicsBox and Trinotate were merged to improve the annotation rate.
Predicted peptide sequences of representative sponge species were annotated against the Pfam 32.0 (8) database and were assigned to their associated GO terms based on Blastp top hits (e-value ≤1 × 10 −5 ) in UniProtKB/Swiss-Prot database.Differences in the distribution of peptides that are assigned to GO terms among sponges were visualized through PCA.Percent abundance was computed relative to the total number of predicted peptides in each species.Molecular functions that distinguish calcareans from other sponges (LDA score ≥ 2, p < 0.05) were determined using Linear Discriminant Analysis effect size (LDA-LEfSe) (20).
Orthologous gene families in the transcriptome of L. chagosensis and in the genomes or transcriptomes of other sponge species were identified using OrthoFinder (21).Intersections of orthologous groups across different species were visualized using the UpSetR package in R (22).

Differential gene expression analysis
Transcript abundance was estimated by mapping reads to the reference transcriptomes using RNA-Seq by Expectation Maximization (29) with bowtie2 alignment (3).DEGs were identified using the edgeR (30) package in R. Generalized linear model functionality for likelihood ratio testing method, which is recommended for datasets with few replicates (30), was applied.Expected counts were converted to counts per million (CPM) and only genes (n = 22 417) with > 2 CPM in at least two libraries were included in edgeR analysis.This filtering step was done to remove lowly expressed genes (< 10 counts).Genes were considered differentially expressed if upregulation or downregulation was ³ 4-fold relative to the controls with a False Discovery Rate (FDR) ≤ 0.05.Pairwise comparisons were conducted between the Present Day samples and samples subjected to the other treatments.

Sequencing and quantitation of LchaXBP1 homologs
Total RNA was extracted from the tissues of sponge individuals exposed to Present Day control and RCP 8.5 treatments (n=3 per treatment) using TRIzol reagent (Invitrogen, Waltham, MA, USA).Contaminating DNA was removed using the TURBO DNA-free Kit (Invitrogen).cDNA synthesis was carried out using the GoScript Reverse Transcriptase kit (Promega, Madison, WI, USA).Full-length and bZIP domains of the two XBP1 genes in L. chagosensis were amplified using specific primers (Table S8).Amplicons were sent to Macrogen, South Korea, for Sanger sequencing.Trimmed and aligned sequences were deposited in GenBank under the accession numbers: PP716769 (LcXBP1_1), PP627506 (LcXBP1_2), PP627507 (LcbZIP2650), and PP627508 (LcbZIP60042).
Expression levels of the two XBP1 homologs in L. chagosensis under the Present Day and RCP 8.5 treatments were estimated by quantitative PCR on a QuantStudio 3 Real-Time PCR system (Thermo Fisher Scientific).Quantitative PCR reactions included 40 cycles of activation at 95°C for 2 min, denaturation at 95°C for 15 sec, and annealing/elongation at 55°C for 1 min.Each reaction contained 10 μl of 2X GoTaq qPCR Master Mix (Promega), 0.2 μl each of 100 μM forward and reverse LcbZIP primers, 4 μl template cDNA, and 5.6 μl nuclease-free H2O.Three biological replicates and three technical replicates were used in the quantitation of each gene alongside negative controls.Primer efficiency and primer specificity were assessed using dilution curves and melt curves, respectively.The abundance of target transcripts was computed using the Pfaffl method (31).Target transcript abundances were normalized to β-tubulin expression.Statistical differences were calculated using the paired two-sample Student's t-Test.

Identification of XBP1 dimer pairs
Leucine zipper heptads of LchaXBP1, PoriXBP1, and CspXBP1 homologs were manually evaluated to identify possible dimerization pairs.Heptads (L0-L5) were grouped (gabcdef) to visualize amino acids in the a, d, e, and g positions, which regulate dimerization stability and specificity of bZIP transcription factors (Fig. S3).The complementary a « a' and d « d' interactions create a hydrophobic core that is essential for dimer stability (32) while electrostatic interactions between g « e' pair can either be attractive or repulsive (33).Attractive basic-acidic interactions include Arg « Glu and Lys « Glu while Glu « Arg, Glu « Lys, Asp « Arg, and Asp « Lys are attractive acidicbasic interactions.Glu « Glu, Glu « Asp, Glu « Gln, and Gln « Glu form acidic repulsive interactions, whereas basic repulsive interactions include Lys « Lys, Arg « Lys, Gln « Lys, Arg « Gln, and Lys « Gln (34).

Generating a reference transcriptome for L. chagosensis
We sequenced the transcriptome of L. chagosensis on the NovaSeq 6000 platform, generating an average of 19 299 672 clean 100 bp paired-end reads (Table S1).De novo transcriptome assembly rendered 248 731 total transcripts.The non-redundant transcriptome, following filtering through isoform selection and sequence clustering, is composed of 91 886 (N50 = 1 409; Ex90N50 = 2 463) transcripts (Table S2).The largest contig is 46 605 bp while the smallest contig is 300 bp long.The transcriptome has 45.94% GC content, 99.20% of all bases are covered by reads, and 87.29% of reads mapped back to the assembly.Assembled transcripts were translated into 44 538 peptides (Table S3).Ortholog benchmarking indicates that the transcriptome contains 93.40% and 90.80% of the eukaryotic and metazoan core genes, respectively (Table S2).

Gene regulatory elements in Leucetta chagosensis
An extensive complement of putative histone modifiers involved in acetylation (histone acetyltransferases (HATs) and histone deacetylases (HDACs)) and methylation (histone methyltransferases (HMTs) and histone demethylases (HDMs)) were detected in the transcriptome of L. chagosensis (Table S15 -16).Core components of the DNA methylation machinery were also identified in the assembly (Table S17).

Predicted dimerization of XBP1 homologs
Attractive acidic -basic g « e' pairs are found in the LchaXBP1_1 homodimer (2 nd heptad), LchaXBP1_2 homodimer (5 th heptad), and LchaXBP1 heterodimer (2 nd and 5 th heptads) (Fig. 5A).LchaXBP1 homologs also contain Asn in position a of the 2 nd , 3 rd , and 5 th heptads, which likely promotes both homo-and heterodimerization (33).Other heterodimerizing leucine zippers comprising any combination of the three aliphatic amino acids: Ile (e.g., LchaXBP1_1, 4 th heptad), Leu (e.g., LchaXBP1_2, 4 th heptad), and Val (e.g., 1 st heptad) in the a position have similar coupling energies (34).The prevalence of Leu in the d position also contribute to dimer stability due to the unique packing interactions of the two Leu and their neighboring amino acids (33).
Although sponges do not possess any structure resembling a functional neuron (41), they have an almost complete set of gene homologs found in mammalian synapses (11).The detection of neurodegeneration-related genes in L. chagosensis can be compared to the observed overlap of core multicellularity genes in the genome of A. queenslandica with the set of genes that are implicated in cancer (11).This highlights the possibility that genomic events linked to early animal evolution can be informative about ancestrally conserved mechanisms that drive aberrant functioning of metazoan-specific traits.The coordinated activation of neurodegeneration-associated genes with UPR components in L. chagosensis under stress may be further elucidated to point out ancestral molecular targets linking environmental etiologies to the development and progression of neurodegenerative diseases.Table S8.List of primer pairs used for sequencing and quantitation of LchaXBP1 homologs.

Fig. S3 .
Fig. S3.Outline of a-helices of two interacting leucine zippers.Amino acids in positions a and d configure the hydrophobic core (yellow).Charged residues in positions e and g generate electrostatic forces (green dashed lines).The hydrophilic surface is formed by the amino acids in positions b, c, and f (red).

Fig. S5 .
Fig. S5.Orthology of upregulated genes.Upregulated genes were classified based on their orthogroup type.Numbers indicate percent of genes in each orthogroup type relative to the total upregulated genes for each treatment.Colors denote different lineages.Assignment of orthogroup types was based on OrthoFinder (21) output.

Fig. S7 .
Fig. S7.Transcription factors in Leucetta chagosensis.Bar plot shows counts and orthology assignments of the top 10 most abundant transcription factor families.Orthology assignments were based on OrthoFinder (21) output.

Fig. S9 .
Fig. S9.Neurodegeneration-associated genes in module L5.(a) Enriched pathways linked to the UPR and neurodegenerative diseases.The bar graph represents the number of genes that are common amongst the KEGG pathways listed on the left.Genes that are exclusively associated with the UPR or linked to both the UPR and neurodegenerative diseases are colored grey.Genes that are exclusive to neurodegenerative diseases are colored blue.(b) Module L5 genes that are part of evolutionarily-conserved neural degeneration pathways (40).Expression levels of each gene are shown as TPM z-score across all treatments (low, blue; high, red).

Table S1 .
Number of raw and cleaned reads generated from transcriptome sequencing of Leucetta chagosensis under the different treatments.

Table S2 .
Assembly statistics of the Leucetta chagosensis transcriptome.Statistics at major filtering steps are shown.

Table S3 .
Number of predicted peptides and annotation rate of the Leucetta chagosensis transcriptome.

Table S4 .
Sponge species included in comparative analyses.LMA, Low Microbial Abundance; HMA, High Microbial Abundance.Sources of the predicted peptide sequences are indicated.

Table S6 .
Distinguishing molecular functions in calcareous sponges.Functions that distinguish (LDA score > 2; p-value <0.05) between calcareans and other sponge species were determined using LDA-LEfSe based on relative abundance values.Abundance was computed as percentage of peptides associated with a specific molecular function relative to the total number of predicted peptides in each species.

Table S7 .
(43) of XBP1 homologs in sponges and other organisms.Asterisks indicate sequences from Jindrich and Degnan, 2016(43).Sources of the predicted peptide sequences are indicated.

Table S13 .
Functional enrichment analysis for modules L1, L4, and L5 components.Terms with p-value < 0.01 were considered significantly enriched.

Table S20 .
Structural homologs of XBP1 dimer pairs.QMEANDisCO is a composite scoring function derived from the entire structure and per residue quality estimates (0, lowest -1, highest).

Table S22 .
(46)ing potential of XBP1 dimers to CRE DNA sequence.Polar interactions between specific residues in XBP1 dimers and cognate DNA sequence are color-coded.DNA bases in bold font are part of the 'aureobox'.Interactions in bold font are polar interactions between an XBP1 dimer and the aureobox "TGACGT".bZIP-DNAinterfaceswere identified through PDBePISA(46).